A Profiling Method for Analyzing Scalability Bottlenecks on Multicores
Abstract
A key quality metric for multi-threaded programs is how their execution time scales as the number of threads increases. However, several bottlenecks can limit the scalability of a multi-threaded program, e.g., contention for shared cache capacity, contention for off-chip memory bandwidth, and synchronization overheads. To improve the scalability of a multi-threaded program, it is vital to quantify how much each of these bottlenecks affects it. We present a software-only profiling method for obtaining speedup stacks. A speedup stack reports how much each scalability bottleneck limits the scalability of a multi-threaded program, and thereby quantifies how much scalability could be improved by eliminating a given bottleneck. A software developer can use this information to determine which optimizations are most likely to improve scalability, while a computer architect can use it to analyze the resource demands of emerging workloads. The proposed method profiles the program on real commodity multi-cores (i.e., no simulation is required) using existing performance counters. Consequently, the obtained speedup stacks account for the idiosyncrasies of the machine on which the program is profiled. While the main contribution of this paper is the profiling method for obtaining speedup stacks, we also present several examples of how speedup stacks can be used to analyze the resource requirements of multi-threaded programs. Furthermore, we discuss how both software developers and computer architects can improve the scalability of these programs.
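To make the idea of a speedup stack concrete, here is a minimal Python sketch. It assumes a simplified additive model in which the time each thread loses to each bottleneck is already known; the abstract does not describe how the paper derives these quantities from performance counters, so the function name `speedup_stack`, the bottleneck labels, and the input numbers are all hypothetical.

```python
from typing import Dict, List


def speedup_stack(parallel_time: float,
                  lost_time: List[Dict[str, float]]) -> Dict[str, float]:
    """Build a speedup stack under a simplified additive model (an assumption).

    parallel_time: wall-clock time of the multi-threaded run.
    lost_time[i][b]: time thread i lost to bottleneck b (hypothetical inputs).
    """
    n_threads = len(lost_time)
    if n_threads == 0 or parallel_time <= 0:
        raise ValueError("need at least one thread and a positive run time")

    stack: Dict[str, float] = {}
    for per_thread in lost_time:
        for bottleneck, lost in per_thread.items():
            # Each thread could contribute at most 1.0 to the speedup;
            # the fraction of the run it lost is charged to the bottleneck.
            stack[bottleneck] = stack.get(bottleneck, 0.0) + lost / parallel_time

    # What remains of the ideal speedup (one per thread) is the achieved speedup.
    stack["achieved_speedup"] = n_threads - sum(stack.values())
    return stack


# Hypothetical 4-thread run of 8 time units with identical per-thread losses.
example = speedup_stack(
    8.0,
    [{"cache": 1.0, "memory_bandwidth": 0.5, "synchronization": 2.0}] * 4,
)
for component, height in example.items():
    print(f"{component}: {height:.2f}")
```

In this sketch the components always sum to the thread count, so each bottleneck's height can be read as the speedup that would be regained if that bottleneck were eliminated, under the stated additive assumption.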
Similar resources
Toward Scalable Transaction Processing
Designing scalable transaction processing systems on modern multicore hardware has been a challenge for almost a decade. The typical characteristics of transaction processing workloads lead to a high degree of unbounded communication on multicores for conventional system designs. In this tutorial, we initially present a systematic way of eliminating scalability bottlenecks of a transaction proc...
Non-Uniform HEVC Tile Partitioning Method for Asymmetric Multicores
This paper proposes a novel high efficiency video coding (HEVC) Tile partitioning method for parallel processing, based on an analysis of the computing ability of asymmetric multicores. The proposed method (i) analyzes the computing ability of asymmetric multicores and (ii) builds a regression model of computational complexity per video resolution. Finally, the model (iii) determines the optimal HEVC...
A Methodology for Accurate, Effective and Scalable Performance Analysis of Application Programs
We describe a unique and comprehensive methodology for accurately measuring and effectively analyzing the performance of an application’s execution. This methodology is 1) accurate, because it assiduously avoids systematic measurement error (such as that introduced by instrumentation); 2) effective, because it associates useful performance metrics (such as memory bandwidth) with important sourc...
Characterizing the Performance and Scalability of Many-core Applications on Virtualized Platforms
Clouds have become attractive to applications because of their low cost and on-demand computing model, enabled by virtualization technologies. With the continually increasing number of cores per chip, it is becoming urgent to study and improve the scalability of virtualized platforms. This paper studies the horizontal scalability of a set of parallel applications on virt...
Profiling Distributed File Systems with Computer Animation
Achieving performance, reliability, and scalability has proven difficult for distributed file systems. Placement of data, load distribution and other overheads are often the culprits. Profiling is a useful technique for understanding file system behavior, improving performance and debugging problems. Existing file system profiling methods often examine fine-grained system activity, such as the ...